(57) Abstract

The invention disclosed herein comprises a method of obtaining full-length coding cDNA. The invention method comprises using 
isolated full-length mRNA to synthesize first strand cDNA, attaching a non-native tag to the cDNA and using the tagged first strand to 
synthesize second strand cDNA. The final cDNA can additionally be amplified and inserted into an expression vector. Also disclosed are
isolated full-length coding cDNAs prepared using the method of the invention, as well as nucleic acid constructs containing the full-length
coding cDNAs.

METHODS OF OBTAINING FULL-LENGTH NUCLEIC ACID SEQUENCES
USING E. COLI TOPOISOMERASE III AND ITS HOMOLOGS

Field of the Invention

The invention disclosed herein relates to the fields of molecular biology and
genomics and methods useful therein. More specifically, the invention relates to
methods useful for the cloning and characterization of nucleic acid sequences
encoding a full-length protein.

Background of the Invention

The fields of biology, agriculture and medicine have been significantly
advanced by the ability to isolate DNA sequences that encode a particular protein,
determine its sequence and eventually cause this DNA sequence to be expressed in a
non-native host. This process has allowed potentially useful proteins to be more
readily studied and has, furthermore, made them available as therapeutic molecules
for the treatment of previously intractable disorders or as tools for the discovery of
new therapeutics.

Recognizing the enormous value in identifying the DNA sequences that
encode proteins, several groups have undertaken to sequence the entire genome of
various organisms, including mice, yeast, several types of bacteria and humans. In
general, genomic DNA is simply isolated, digested into fragments of a manageable
size and sequenced using an automated sequencing system. Computerized searches of
the resulting sequence data allow scientists to identify particular stretches of sequence
that appear to encode a protein. This technique has proved useful when working with
organisms having a relatively small genome. However, the genomes of complex
organisms, such as humans, appear to contain a great deal of DNA sequence that does
not encode proteins (introns and the like), thus blanket sequencing approaches may be
a relatively inefficient means for identifying previously unknown, potentially useful
proteins.

Techniques can be used to circumvent the issue of sequencing more DNA than
one needs to identify new proteins. Typically, the relatively non-abundant messenger
RNA (mRNA) is isolated from a population of cells by contacting total isolated RNA
with a poly-T sequence bound to a solid support under conditions where the 3' poly-A
tail of the mRNA will bind to the poly-T sequence. Unbound molecules are washed
away and the mRNA is released with a low salt wash. cDNA is produced from the
mRNA by 1) using reverse transcriptase to synthesize first strand DNA, 2)
hydrolyzing the mRNA template by exposure to base, then 3) synthesizing second
strand DNA with DNA polymerase. This results in a blunt-ended double-stranded
sequence, which can then either be inserted into an expression vector directly or first
tagged with a short linker nucleic acid having a restriction endonuclease recognition
sequence contained therein.

This method, however, also has significant drawbacks. There is no assurance,
for example, that the isolated mRNAs represent the full-length sequence encoding a
protein. Furthermore, blunt-ended ligations require the use of T4 ligase, which is
relatively slow acting, temperature sensitive and inefficient. Therefore, a need exists
for a robust, straightforward method of obtaining full-length coding sequence that can
subsequently be easily cloned into a vector for expression. The present invention
addresses this and related needs.

Brief Description of the Invention 

The present invention comprises a method of obtaining full-length coding
sequences by, in part, employing the attachment of a non-native tag to a
single-stranded cDNA product. In particular, the invention method comprises isolating
full-length mRNA, using the isolated mRNA to synthesize first strand cDNA, attaching a
non-native tag to the cDNA and using the tagged first strand to synthesize second
strand cDNA. The final cDNA can additionally be amplified, using the polymerase
chain reaction for example, and inserted into an expression vector.

A further aspect of the invention comprises isolated full-length coding cDNA
prepared using the method of the invention, as well as nucleic acid constructs
containing the full-length coding cDNA. The invention method is efficient and robust
and provides cDNA products that can be of great benefit to those wishing to obtain
and study novel proteins and/or use said proteins as a means to develop new and
useful substances.

Brief Description of the Figures 

Figure 1 is a schematic representation showing attachment of a non-native tag
to single-stranded cDNA using E. coli topoisomerase III.

Detailed Description of the Invention 

In accordance with the present invention there are provided methods of
obtaining full-length coding sequence from any organism. The invention methods
comprise a series of steps: isolating full-length mRNA, synthesizing first strand
cDNA using the mRNA isolated in the first step as a template, attaching a non-native
tag to the 3'-end of the first strand cDNA and synthesizing second strand cDNA using
the tagged first strand cDNA as a template. The invention method is unique in that it
employs a novel method for the attachment of a non-native tag to a single-stranded
DNA, in particular a tag comprising a sequence of nucleic acids having additional
utility, such as, for example, providing for convenient cloning activities.

As used herein, the term "coding sequence" refers to the nucleic acids
encoding the amino acid sequence of a protein, whereas a "gene sequence" refers to
the entire nucleic acid sequence that is necessary for the synthesis of a functional
polypeptide. A gene sequence could include, for example, a promoter in addition to a
coding sequence. The term "cDNA" is used to refer to DNA that is complementary to
mRNA, that is, unlike genomic DNA, it lacks introns.

The mRNA used in the method of the invention is typically derived from total
RNA isolated from a population of cells. Techniques for isolation of total RNA are
well known in the art. Exemplary techniques are described in Ausubel, et al., Short
Protocols in Molecular Biology, 3rd ed. 1995, which is incorporated by reference
herein in its entirety.

Total RNA is a mixed population of transfer RNA (tRNA) and mRNA from
which the mRNA must be separated prior to initiating cDNA synthesis. The most
widely used technique exploits the poly-A tail found on the 3'-end of mRNA, as
described above. This technique results in a population of mRNA sequences that
contains a mixture of full-length and partial mRNAs, which then produces a
population of cDNAs that will be a mixture of both full-length coding sequences and
partial sequences. A more preferred method useful with eukaryotic cells isolates only
full-length mRNA sequences utilizing the 5'-cap structure found on such sequences.

Most eukaryotic mRNAs are modified at their 5'-ends with a cap structure
consisting of a 7-methylguanosine in a 5'-to-5' triphosphate linkage to the first
transcribed nucleotide of the mRNA. This structure is added early during RNA
transcription and is required as part of RNA biogenesis (see Sonenberg, N. Prog.
Nucleic Acid Res. 35:173-207, 1988 for a review).

The cap structure can be employed to isolate full-length mRNA from other
RNAs by several methods. In one method, the cap structure of RNA (total or mRNA
previously isolated by oligo-dT techniques as described above) is modified by adding
an affinity purification tag such as biotin, chitin binding domain,
glutathione-S-transferase, and the like to the cap structure (Carninci, et al., DNA Res. 4(1):61-66,
1997). The affinity tagged capped mRNA can then be isolated from degraded mRNA
or RNAs with poly-A tails that are not full-length. The affinity tagged mRNA is
separated from untagged RNA using affinity purification, for example by contacting
the tagged mRNA with an affinity purification material such as a solid support
complexed with streptavidin, avidin, chitin, glutathione, and the like. Suitable solid
supports include various column chromatography gels, such as sepharose, agarose,
cellulose, and the like, and magnetic beads. Chromatography gels can be additionally
modified with substances such as nickel.

Alternatively, unmodified capped mRNA can be separated from RNA species
lacking a cap by several methods. For example, the capped mRNA can be contacted
with a solid support complexed to, for example, phenylboronic acid or
dihydroxylborate (see Theus and Liarakos, Biotechniques 9(5):610-612, 1990). The
boronate ligand binds specifically and reversibly to vicinal diols such as the
2',3'-cis-diol of 7-methylguanosine ribose. The complex is disrupted by acidic pH and a
chelating agent such as EDTA, thereby releasing the mRNA from the affinity
purification material.

Unmodified capped mRNAs may also be isolated using one or more
cap-binding proteins bound to a solid support. Cap-binding proteins specifically
recognize and bind to 7-methylguanosine. Several cap-binding proteins have been
identified and isolated including eIF4E, eIF4A, eIF4G, eIF4B, and eIF4F, which exist
in isomeric form in some species (see Haghighat and Sonenberg, JBC 272:21677-
21680, 1997; van Heerden and Browning, J Biol Chem 269:17454-17457, 1994), and
nCBP (see Ruud, et al., JBC 273(17):10325-10330, 1998). eIF4F is a multi-subunit
complex composed of eIF4E, eIF4A and eIF4G.

Edery, et al. reported a method for isolating capped mRNA using eIF4E bound
to a solid support (Edery, et al., Mol. Cell. Biol. 15(6):3363-3371, 1995). eIF4E alone,
however, is much less efficient at interacting with capped mRNA compared to
complexes comprising eIF4E and eIF4G or nCBP and eIF4G. Thus, a preferred
method of isolating capped mRNA comprises using an eIF4E/eIF4G or nCBP/eIF4G
complex bound to a solid support. Such complexes can be comprised of isolated,
complexed individual proteins, or alternatively, a fusion protein comprising the
cap-binding portions of the relevant cap-binding proteins.

Complexes suitable for use in the present invention can be produced in a
variety of ways. Each of the components of the complex, generally two, can be
isolated separately from a natural or recombinant source then either sequentially
immobilized on a solid support or first mixed and allowed to form complexes in
solution, which are subsequently bound to a solid support. A preferred method of
forming complexes is to recombinantly produce them in the same cell line, so that
complexes are formed within the cellular environment, purified, then bound to a solid
support. Affinity purification tags, as described below, can be added to the
recombinant proteins to aid in purification.

General procedures used to make fusion proteins are well known in the art.
Fusion proteins that would be suitable for use in the practice of the present invention
include a fusion between the C-terminus of full-length eIF4E and the N-terminus of
full-length eIF4G, with or without the addition of an affinity purification tag, or the
N-terminus of full-length eIF4E attached to the C-terminus of full-length eIF4G, with or
without an affinity purification tag.

Cells from a variety of sources can be used in practicing the method
of the invention, including both eukaryotic cells and prokaryotic cells,
although only eukaryotic cells (such as plant, animal or insect cells) can be used as a
source for mRNA isolated by the cap-binding procedure described above. Suitable
animal cells include mammalian cells (human, rodent, non-human primate, goat,
sheep, cow, and the like) and insect cells (moth, Drosophila, and the like). Cells in
any stage of differentiation are suitable as a source for mRNA, as are cells in any
particular activation state. Methods of extracting mRNA from different cell types are
well known in the art (see, for example, Ausubel, et al., supra).

The isolated mRNA (either eukaryotic or prokaryotic) is used as a template for
the synthesis of the first, or anti-sense, strand of the cDNA. Methods of first strand
cDNA synthesis are well known in the art. Generally, a poly(dT)-containing
oligonucleotide is used as a primer for synthesis by a reverse transcriptase enzyme.
First strand synthesis techniques are described in detail in Ausubel, supra, and in the
examples below.

The resulting mRNA/cDNA hybrid is then treated with a substance that will degrade the
mRNA, such as, for example, NaOH, yielding single-stranded first strand cDNA. In
an alternative embodiment, prior to degradation of the entire mRNA strand, the
mRNA/cDNA hybrid may first be treated with a substance that degrades
single-stranded mRNA, such as, for example, RNAse. In this embodiment, any of the
5'-capped ends of any mRNA/cDNA hybrids wherein the cDNA is not full-length will
be removed. After RNAse treatment, the treated hybrids are subjected to a cap-based
isolation, as described above, to remove those hybrids in
which the first strand cDNA is not full-length. The isolated full-length
RNAse-treated hybrids are then treated to degrade the mRNA, as described above.

A non-native tag is next attached to the first strand cDNA. Scheele
(US Patent 5,[...], issued [...]) describes a method wherein a homopolymeric
oligonucleotide tag is added to the 3' end of first strand cDNA using an enzyme such
as terminal deoxyribonucleotidyl transferase, polyA polymerase, and the like. The
resulting DNA either possesses a persistent homopolymeric oligonucleotide tag with
no utility or, using the method described by Scheele, the tag is removed and the
resulting cDNA is left with blunt ends, which reduces later cloning efficiency. An
alternative method of adding a non-native tag sequence to single-stranded DNA uses
T4 RNA ligase (see Tessier, et al., Anal. Biochem. 158:174-178, 1986; McCoy and
Gumport, Biochemistry 19:635-642, 1980). This enzyme is, however, relatively slow
and inefficient and does not work efficiently with every sequence (see Harada and
Orgel, Proc. Natl. Acad. Sci. USA 90:1576-1579, 1993).

In contrast to the tags employed by Scheele, the non-native tags preferably
employed in the invention method are nucleic acids which have some utility, such as
comprising an enzyme recognition sequence that can be employed in subsequent
cloning and/or insertion into a nucleic acid construct.

In accordance with the present invention, a non-native tag sequence is attached
to the 3'-end of the first strand cDNA by a type I topoisomerase such as E. coli
topoisomerase III, yeast topoisomerase III, and the like (see DiGate and Marians,
JBC 264:17924-17930, 1989; Kim and Wang, JBC 267:17178-17185, 1992). These
enzymes play a key role in DNA metabolism. Type I topoisomerases recognize and
bind to specific recognition sequences in single-stranded DNA, catalyze a strand
break and strand passage event, then reseal the break. An enzyme-DNA adduct is
formed at the point of DNA cleavage, with the enzyme covalently bound to the
nucleotide 5' to the cleavage site.

A preferred enzyme for use in the invention method is E. coli topoisomerase
III (topo III). Topo III is a type I topoisomerase which recognizes, binds to and
cleaves the single-stranded sequence 5'-GCAACTT-3' (SEQ. ID NO:1) (Zhang, et al.,
J. Biol. Chem. 270(40):23700-23705, 1995). A homolog, the traE protein of plasmid
RP4, has been described by Li, et al., J. Biol. Chem. 272:19582-19587, 1997. A
DNA-protein adduct is formed with the enzyme covalently binding to the
5'-thymidine residue, with cleavage occurring between the two thymidines.
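
By way of illustration only, the recognition and cleavage behavior described above can be sketched computationally. The following Python sketch is not part of the disclosed method. It assumes an idealized exact match to 5'-GCAACTT-3' and places the cut between the two thymidines, as stated above. The function names and the example oligonucleotide are hypothetical.

```python
# Recognition sequence quoted in the text (SEQ ID NO:1).
TOPO_III_SITE = "GCAACTT"
# Cleavage is described as occurring between the two thymidines,
# i.e. after the sixth base of the site (GCAACT^T).
CUT_OFFSET = 6


def find_topo_iii_sites(seq):
    """Return the 0-based start of every exact GCAACTT match in seq."""
    seq = seq.upper()
    positions = []
    start = seq.find(TOPO_III_SITE)
    while start != -1:
        positions.append(start)
        start = seq.find(TOPO_III_SITE, start + 1)
    return positions


def split_at_first_site(seq):
    """Split seq at the first site; return (upstream, downstream) or None.

    The upstream piece ends in ...GCAACT and the downstream piece
    begins with the second thymidine of the site.
    """
    seq = seq.upper()
    sites = find_topo_iii_sites(seq)
    if not sites:
        return None
    cut = sites[0] + CUT_OFFSET
    return seq[:cut], seq[cut:]


if __name__ == "__main__":
    oligo = "AAGCAACTTGG"  # hypothetical single-stranded oligonucleotide
    print(find_topo_iii_sites(oligo))
    print(split_at_first_site(oligo))
```

Such a check may be convenient when designing a tag-forming oligonucleotide, for example to confirm that the recognition sequence occurs exactly once.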
The non-native tag can be prepared by incubating a single-stranded nucleic
acid molecule with topo III under conditions suitable for the enzyme's activity.
The single-stranded molecule contains a topo III recognition site and a sequence useful in
priming second strand synthesis and/or in subsequent cloning activities, such as, for
example, a recognition sequence for a restriction endonuclease or a recombinase
which recognizes and acts on double-stranded DNA sequences such as Cre, Flp or
vaccinia topoisomerase. Methods of cloning using vaccinia topoisomerase, for
example, are described in U.S. Patent No. 5,766,891 (Shuman, issued June 16, 1998),
which is incorporated by reference herein in its entirety.

The enzyme will bind to and cleave the nucleic acid molecule at the enzyme's
recognition site [...], becoming covalently
linked to the DNA 3' to the cleavage site via an enzyme-bridged phosphotyrosine
linkage. The resulting adduct can be isolated by known
methods, for example treatment with detergents to temporarily inactivate the enzyme
followed by purification of the complex by standard chromatography or
electrophoresis techniques, both of which are well known in the art. Alternatively, the
tag-formation reaction can be performed while one end of the uncleaved molecule is
[...]. As a further alternative, the pre-cleaved nucleic acid molecule can be designed
to contain a bridging phosphorothioate between the two thymidines of the GCAACTT
cleavage/recognition sequence. When cleaved, the clipped piece will contain a 3'-SH
[...]

In an additional embodiment, the DNA "tag" can be a nucleic acid vector
rather than a shorter nucleic acid molecule. Suitable vectors can be prepared as
follows. A nucleic acid molecule suitable for attachment to a vector can be created by
[...] single strands of DNA which when annealed leave a
single-stranded overhang containing a topoisomerase III recognition/cleavage
sequence. The double stranded end of the attachment molecule can either be blunt or
contain a restriction endonuclease cutting sequence. A vector (as described below)
can be linearized using a restriction endonuclease in such a way that the 5' end will
match that of the attachment molecule. The attachment molecule is similarly treated,
then ligated to the vector, forming a vector with a single stranded overhang containing
the topoisomerase III recognition site. The modified vector is treated as described
above to create an enzyme-vector adduct.

The topo III-tag adduct is incubated with the isolated first strand cDNA under
conditions such that the enzyme will catalyze the joining reaction between the 3'-OH
of the cDNA and the 5' end of the tag. Suitable conditions are discussed in greater
detail in the Examples below.

Once the first strand cDNA and tag have been joined, the tagged first strand
cDNA is used as a template to synthesize second strand cDNA. Methods of producing
second strand cDNA are well known in the art (see, Ausubel, supra). One particularly
useful method is the polymerase chain reaction
(PCR), a technique well known in the art. The use of PCR has the advantage of
amplifying the number of copies of any particular DNA sequence in addition to the
creation of double-stranded DNAs. Nucleic acids complementary to the non-native
tag and either a poly-dT or a nucleic acid complementary to the 5' end of the DNA
can be used as primers for strand extension. If the first strand DNA is joined to a
vector, DNA polymerase can be used to fill in the single stranded portions, creating
blunt ends which can be ligated with T4 DNA ligase to create circular and
transformable DNA according to techniques well known in the art.
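
The primer relationship described above can be illustrated with a short Python sketch. It is offered only as an aid to understanding and makes several assumptions not found in the text: the tag sequence shown is hypothetical (it merely contains an EcoRI site, GAATTC), the poly-dT length of 18 is an arbitrary default, and practical considerations such as melting temperature and secondary structure are ignored.

```python
# Watson-Crick complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def second_strand_primers(tag_seq, oligo_dt_length=18):
    """Return (tag_primer, poly_dt_primer) for strand extension.

    tag_seq is the non-native tag as joined to the 3' end of the
    first strand cDNA, written 5'->3'. The primer that anneals to it
    is its reverse complement. The second primer is a simple poly-dT.
    """
    tag_primer = reverse_complement(tag_seq)
    poly_dt_primer = "T" * oligo_dt_length
    return tag_primer, poly_dt_primer


if __name__ == "__main__":
    hypothetical_tag = "TGAATTC"  # assumption: illustrative tag only
    print(second_strand_primers(hypothetical_tag))
```

The tag-complementary primer initiates second strand synthesis on the tagged first strand, while the poly-dT primer anneals to the poly-A tract that the second strand acquires at its 3' end.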

In general, the procedure described above will efficiently produce full-length
protein-encoding nucleic acids. The proteins encoded thereby can then be produced
by inserting the newly cloned nucleic acid into an expression vector, which is
subsequently transfected into a host cell for protein expression.

The amplified double-stranded DNAs can be isolated from the other
components of the amplification reaction mixture prior to insertion into an expression
vector. This purification can be accomplished using a variety of methodologies such
as column chromatography, gel electrophoresis, and the like. One method of
purification utilizes low-melt agarose gel electrophoresis. The reaction mixture is
separated and visualized by ethidium bromide staining. DNA bands that represent a
majority of the amplification products are cut away from the rest of the gel and placed
into appropriate corresponding wells of a 96-well microtiter plate. These plugs are
subsequently melted and the DNA contained therein utilized as cloning inserts. The
use of gel electrophoresis has the advantage that the practitioner can simultaneously
purify the desired nucleic acid and verify that the sequence is of a reasonable size, i.e.,
probably represents the entire desired coding sequence.

The purified coding sequence can then be inserted into an expression vector.
A variety of expression vectors are suitable for use in the method of the invention,
both for prokaryotic expression and eukaryotic expression. In general, the expression
vector will have one or more of the following features: a promoter-enhancer, a
selection marker, an origin of replication, an affinity purification tag, an inducible
element, an epitope-tag, and the like.

Promoter-enhancers are DNA sequences to which RNA polymerase binds and
initiates transcription. The promoter determines the polarity of the transcript by
specifying which strand will be transcribed. Bacterial promoters consist of -35 and
-10 (relative to the transcriptional start) consensus sequences which are bound by a
specific sigma factor and RNA polymerase. Eukaryotic promoters are more complex.
Most promoters utilized in expression vectors are transcribed by RNA polymerase II.
General transcription factors (GTFs) first bind specific sequences near the start and
then recruit the binding of RNA polymerase II. In addition to these minimal promoter
elements, small sequence elements are recognized specifically by modular
DNA-binding/trans-activating proteins (e.g. AP-1, SP-1) which regulate the activity of a
given promoter. Viral promoters serve the same function as bacterial or eukaryotic
promoters and either provide a specific RNA polymerase in trans (bacteriophage T7)
or recruit cellular factors and RNA polymerase (SV40, RSV, CMV). Viral promoters
are preferred as they are generally particularly strong promoters.
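
The bacterial promoter architecture described above lends itself to a simple sequence scan, sketched below in Python for illustration only. The consensus hexamers used (TTGACA at -35 and TATAAT at -10) and the 16 to 18 base spacer are the commonly cited values for the E. coli sigma-70 promoter and are assumptions not stated in the text. Natural promoters rarely match both hexamers exactly, so an exact-match scan of this kind will miss most real promoters. The function name is hypothetical.

```python
import re

# Assumed sigma-70 consensus elements (not given in the text above):
# -35 hexamer TTGACA, spacer of 16-18 bases, -10 hexamer TATAAT.
# The lookahead allows overlapping candidates to be reported.
PROMOTER_PATTERN = re.compile(r"(?=(TTGACA)([ACGT]{16,18})(TATAAT))")


def find_consensus_promoters(seq):
    """Return (minus35_start, minus10_start, spacer_length) tuples.

    Positions are 0-based offsets into seq. Only exact matches to
    both consensus hexamers are reported.
    """
    seq = seq.upper()
    hits = []
    for match in PROMOTER_PATTERN.finditer(seq):
        hits.append((match.start(1), match.start(3), len(match.group(2))))
    return hits


if __name__ == "__main__":
    # Hypothetical test sequence with a 17-base spacer.
    example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "CC"
    print(find_consensus_promoters(example))
```

A scan of this kind might be used, for example, to check that a cloned insert does not carry a fortuitous consensus promoter in addition to the promoter supplied by the vector.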



WO 00/56878    PCT/US00/06560

Promoters may be, furthermore, either constitutive or, more preferably,
regulatable (i.e., inducible or derepressible). Inducible elements are DNA sequence
elements which act in conjunction with promoters and bind either repressors (e.g.
lacO/LAC Iq repressor system in E. coli) or inducers (e.g. gal1/GAL4 inducer system
in yeast). In either case, transcription is virtually "shut off" until the promoter is
derepressed or induced, at which point transcription is "turned on".

Examples of constitutive promoters include the int promoter of bacteriophage
λ, the bla promoter of the β-lactamase gene sequence of pBR322, the CAT promoter
of the chloramphenicol acetyl transferase gene sequence of pPR325, and the like.

Examples of inducible prokaryotic promoters include the major right and left
promoters of bacteriophage λ (PL and PR), the trp, recA, lacZ, lacI, araC and gal
promoters of E. coli, the α-amylase (Ulmanen et al., J. Bacteriol. 162:176-182, 1985)
and the sigma-28-specific promoters of B. subtilis (Gilman et al., Gene
32:11-20, 1984), the promoters of the bacteriophages of Bacillus (Gryczan, In: The
Molecular Biology of the Bacilli, Academic Press, Inc., NY (1982)), Streptomyces
promoters (Ward et al., Mol. Gen. Genet. 203:468-478, 1986), and the like.
Exemplary prokaryotic promoters are reviewed by Glick (J. Ind. Microbiol. 1:277-
282, 1987); Cenatiempo (Biochimie 68:505-516, 1986); and Gottesman (Ann. Rev.
Genet. 18:415-442, 1984).

Preferred eukaryotic promoters include, for example, the promoter of the
mouse metallothionein I gene sequence (Hamer et al., J. Mol. Appl. Gen. 1:273-288,
1982); the TK promoter of Herpes virus (McKnight, Cell 31:355-365, 1982); the
SV40 early promoter (Benoist et al., Nature (London) 290:304-310, 1981); the yeast
gal4 gene sequence promoter (Johnston et al., Proc. Natl. Acad. Sci. (USA)
79:6971-6975, 1982; Silver et al., Proc. Natl. Acad. Sci. (USA) 81:5951-5955,
1984), the CMV promoter, the EF-1 promoter, ecdysone-responsive promoter, and the
like.

Selection markers are valuable elements in expression vectors as they provide
a means to allow only those cells which contain a vector to grow. Such markers are
of two types: drug resistance and auxotrophic. A drug resistance marker enables
[...] (useful in mammalian [...]

Additional elements that can be included in an expression vector employed in [...]
(LaVallie, et al., Bio/Technology 11:187-193, 1993) (which binds to metal-chelating
resins), and the like can also be used for facilitating purification of the protein of
interest. The affinity purification tag can be separated from the protein of interest by
methods well known in the art, including the use of inteins (protein self-splicing
elements, Chong, et al., Gene 192:271-281, 1997).

Epitope tags are short peptides that are recognized by epitope specific
antibodies. A fusion protein comprising a recombinant protein and an epitope tag can
be simply and easily purified using an antibody bound to a chromatography resin.
The presence of the epitope tag furthermore allows the recombinant protein to be
detected in subsequent assays, such as Western blots, without having to produce an
antibody specific for the recombinant protein itself. Examples of commonly used
epitope tags include V5, glutathione-S-transferase (GST), hemagglutinin (HA), the
peptide Phe-His-His-Thr-Thr (SEQ ID NO:2), chitin binding domain, and the like.

A further useful element in an expression vector is a multiple cloning site or
polylinker. Synthetic DNA encoding a series of restriction endonuclease recognition
sites is inserted into a plasmid vector downstream of the promoter element. These
sites are engineered for convenient cloning of DNA into the vector at a specific
position.

The foregoing elements can be combined to produce expression vectors useful
in expression of the coding sequences created using the invention method. Suitable
prokaryotic vectors include plasmids such as those capable of replication in E. coli
(for example, pBR322, ColE1, pSC101, pACYC184, πVX, pRSET, pBAD
(Invitrogen, Carlsbad, CA), and the like). Such plasmids are disclosed by Sambrook
(cf. "Molecular Cloning: A Laboratory Manual", second edition, edited by
Sambrook, Fritsch, & Maniatis, Cold Spring Harbor Laboratory, (1989), the relevant
sections of which are incorporated by reference herein). Bacillus plasmids include
pC194, pC221, pT127, and the like, and are disclosed by Gryczan (In: The Molecular
Biology of the Bacilli, Academic Press, NY (1982), pp. 307-329). Suitable
Streptomyces plasmids include pIJ101 (Kendall et al., J. Bacteriol. 169:4177-
4183, 1987), and Streptomyces bacteriophages such as φC31 (Chater et al., In: Sixth
International Symposium on Actinomycetales Biology, Akademiai Kaido, Budapest,
Hungary (1986), pp. 45-54). Pseudomonas plasmids are reviewed by John et al. (Rev.
Infect. Dis. 8:693-704, 1986), and Izaki (Jpn. J. Bacteriol. 33:729-742, 1978).

Suitable eukaryotic plasmids include, for example, BPV, vaccinia, SV40,
2-micron circle, pCDNA3.1, pCDNA3.1/GS, pYES2/GS, pMT, pIND, pIND(Sp1),
pVgRXR (Invitrogen), and the like, or their derivatives. Such plasmids are well
known in the art (Botstein et al., Miami Wntr. Symp. 19:265-274, 1982; Broach, In:
"The Molecular Biology of the Yeast Saccharomyces: Life Cycle and Inheritance",
Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, p. 445-470, 1981; Broach,
Cell 28:203-204, 1982; Dilon et al., J. Clin. Hematol. Oncol. 10:39-48, 1980;
Maniatis, In: Cell Biology: A Comprehensive Treatise, Vol. 3, Gene Sequence
Expression, Academic Press, NY, pp. 563-608, 1980).

The coding sequence can be inserted into an expression vector using any of a
variety of techniques. For example, if the non-native tag added to the cDNA contains
a restriction endonuclease recognition sequence, the coding sequence can be ligated
into a compatible sequence existing in the desired expression vector using standard
techniques. A particularly preferred method is the use of the vaccinia topoisomerase
cloning system described in US Patent No. 5,766,891, issued June 16, 1998 to S.
Shuman and available as a cloning kit from Invitrogen Corp., Carlsbad, CA.

Prokaryotic hosts are, generally, very efficient and convenient for the
production of recombinant proteins and are, therefore, one type of preferred
expression system. Prokaryotes most frequently are represented by various strains of
E. coli. However, other organisms may also be used, including other bacterial strains.
The prokaryotic host must be compatible with the replicon and
control sequences in the expression plasmid. Recognized prokaryotic hosts include
bacteria such as E. coli and those from genera such as Bacillus, Streptomyces,
Pseudomonas, Salmonella, Serratia, and the like. However, under such conditions
the polypeptide will not be glycosylated.

Suitable hosts may often include eukaryotic cells. Preferred eukaryotic hosts
include, for example, yeast, fungi, insect cells, and mammalian cells either in vivo, or
in tissue culture. Mammalian cells which may be useful as hosts include HeLa cells,
cells of fibroblast origin such as VERO, 3T3 or CHO-K1, HEK 293 cells or cells of
lymphoid origin (such as 32D cells), and their derivatives. Preferred mammalian host
cells include non-adherent cells such as CHO, 32D, and the like. Preferred yeast host
cells include S. pombe, S. cerevisiae (such as INVSc1), and the like.

In addition, plant cells are also available as hosts, and control sequences
compatible with plant cells are available, such as the cauliflower mosaic virus 35S
and 19S, nopaline synthase promoter and polyadenylation signal sequences, and the
like. Another preferred host is an insect cell, for example the Drosophila larvae.
Using insect cells as hosts, the Drosophila alcohol dehydrogenase promoter can be
used (Rubin, Science 240:1453-1459, 1988). Alternatively, baculovirus vectors can
be engineered to express large amounts of peptide encoded by a desired gene
sequence in insect cells (Jasny, Science 238:1653, 1987; Miller et al., In: Genetic
Engineering (1986), Setlow, J.K., et al., eds., Plenum, Vol. 8, pp. 277-297). The
present invention also features the purified, isolated or enriched versions of the
expressed coding sequence products produced by the methods described above.

The invention will now be described in greater detail by reference to the 
following non-limiting example. 

Full-length mRNA is isolated from a population of cells by first isolating total
RNA using a commercially available kit, RNA Isolation Reagent (RNAwiz, Ambion)
according to the manufacturer's instructions.

To prepare cap-binding protein affinity material, the 560 bp encoded wheat
germ eIF-4E protein (the p86 subunit, Van Heerden and Browning, supra) is cloned
into a vector such as pCR4 (Invitrogen), according to the manufacturer's instructions,
to produce a Thio-Patch fusion protein. A chitin binding domain encoding sequence
is also cloned upstream of the Thio-Patch site (pCR4), so that the resulting expressed
eIF-4E protein contains both a chitin binding domain and Thio-Patch at the N-
terminus. A stop codon is introduced at the end of the insert so that the coding
sequence will not read through to the V5 epitope contained on the vector.

Vector-transformed E. coli are grown and protein expression induced
according to the manufacturer's instructions. Lysates are prepared by resuspending a
liter of pelleted cells in Buffer A lysis solution (25-30 ml Buffer A (10 mM Tris, pH
7.5, 0.2 mM EDTA, 1 mM DTT, 10% glycerol, 100 mM KCl) plus 0.5% Triton X-
100, 1 mM PMSF (phenylmethylsulfonyl fluoride)), repeated sonication, then
centrifugation at 12,000 rpm for 15 minutes. The cleared lysate is removed from the
pelleted cell debris and 1 mM PMSF added.
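
The Buffer A recipe above can be converted into pipetting volumes with the usual C1 x V1 = C2 x V2 dilution rule. The following is only an illustrative sketch: the stock concentrations (1 M Tris, 0.5 M EDTA, 1 M DTT, 100% glycerol, 2 M KCl), the 30 ml batch size, and the function name stock_volumes_ml are assumptions chosen for the example and are not specified in this disclosure.

```python
# Final concentrations of Buffer A as stated in the text.
# Units: mM for Tris, EDTA, DTT and KCl; percent for glycerol.
BUFFER_A = {
    "Tris pH 7.5": 10.0,
    "EDTA": 0.2,
    "DTT": 1.0,
    "glycerol": 10.0,
    "KCl": 100.0,
}

# ASSUMED stock concentrations, in the same units as above.
# These are hypothetical values; substitute the stocks actually on hand.
STOCKS = {
    "Tris pH 7.5": 1000.0,
    "EDTA": 500.0,
    "DTT": 1000.0,
    "glycerol": 100.0,
    "KCl": 2000.0,
}


def stock_volumes_ml(final, stocks, total_ml):
    """Return the ml of each stock, plus water, needed for total_ml of buffer."""
    volumes = {name: total_ml * conc / stocks[name] for name, conc in final.items()}
    # Whatever volume is not taken up by the stocks is made up with water.
    volumes["water"] = total_ml - sum(volumes.values())
    return volumes


volumes = stock_volumes_ml(BUFFER_A, STOCKS, total_ml=30.0)
for name, ml in volumes.items():
    print(f"{name:12s} {ml:7.3f} ml")
```

The Triton X-100 and PMSF additions of the lysis solution are left out of the sketch; they could be added as further dictionary entries in the same way.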

The protein is purified using m7GTP Sepharose (Pharmacia). The resin is
equilibrated with Buffer A, then added to the cleared lysate and stirred slowly
overnight at 4°C. The resin is pelleted by centrifugation and the liquid removed. The
resin is resuspended in Buffer A, washed by agitation at room temperature for 30
minutes, then re-pelleted. After removal of the liquid, the washed resin is
resuspended in Buffer A and loaded into an empty column, allowing the Buffer A to
drain through. The column is continuously washed with Buffer A until no protein is
detected in the flow-through. The cap binding protein is eluted with 70 mM m7GTP in
Buffer A.

The isolated protein is then bound to a solid support such as Ni-ProBond
Chitin Beads (New England BioLabs), used according to the manufacturer's
instructions. Total RNA is added to the resin in a binding buffer containing 10 mM
KHPO4 (pH 8), 100 mM KCl, 5% glycerol, 2 mM EDTA, and 6 mM dithiothreitol
(DTT) (added just before using) and supplemented with 1.3% polyvinyl alcohol
(PVA). The RNA is mixed with the resin on an orbital shaker for approximately 1
hour at room temperature. The resin is then washed with binding buffer without PVA
3-4 times to remove unbound mRNA. After the final wash is removed, 600 µl of
10 mM Tris (pH 8) is added to the resin. The bound mRNA is eluted using an equal
volume of phenol/chloroform, pH 8. Following elution, the mRNA is precipitated
with 0.4 M NaOAc and an equal volume of isopropanol at -70°C for at least 30
minutes, then spun down in a microfuge to collect the RNA pellet. 10 µg of mussel
glycogen is added as a carrier to aid in precipitation. The mRNA pellet is
resuspended in 20 µl of RNAse-free water.

The isolated mRNA is converted to first-strand cDNA using a cDNA Cycle®
Kit (Invitrogen) using the oligo dT primer provided and the protocols suggested by
the manufacturer. The RNA is hydrolyzed by exposure to 0.2 N NaOH at 37°C for
30 minutes. The resulting single-stranded cDNA is tagged by the addition of
previously prepared tag covalently bound to topoisomerase III (see below) in Tris
pH 8.0, 1 mM Mg acetate, 0.1 µg/ml BSA, NaCl for 5 minutes at 37°C.

Topoisomerase III:DNA tags can be prepared by incubating a sequence such
as, for example, 5'-]SnSfNGCAACT*TCCCTATAGTGAGTCGTATTA-3' (SEQ ID
NO:3) with E. coli topoisomerase III in 40 mM Hepes-KOH (pH 8.0 at 22°C), 0.1
mg/ml bovine serum albumin, and 1 mM magnesium acetate (pH 7.0) at 37°C for 3 -
5 minutes (where * marks the cleavage point). The reaction can be stopped by the
addition of SDS or another mild detergent to a final concentration of 2%. The
covalent enzyme:DNA adduct is isolated using column chromatography, for example
using a size exclusion resin and HPLC.
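
As an illustration of how the cleavage point divides the tag oligonucleotide, the sketch below splits the sequence at the "*" marker and computes the reverse complement of the portion 3' of the cleavage point, which is the kind of sequence a primer annealing to the attached tag would carry. It makes several assumptions that are not stated in this disclosure: only the portion of the tag from GCAACT onward is used, the fragment 3' of the cleavage point is taken to be the part that ends up joined to the cDNA, and the names split_at_cleavage and reverse_complement are hypothetical helpers.

```python
# Portion of the example tag oligonucleotide used for this sketch;
# '*' marks the topoisomerase III cleavage point given in the text.
TAG_WITH_SITE = "GCAACT*TCCCTATAGTGAGTCGTATTA"

# Translation table mapping each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def split_at_cleavage(seq):
    """Split a sequence at the '*' marker into (5' part, 3' part)."""
    upstream, downstream = seq.split("*")
    return upstream, downstream


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


upstream, downstream = split_at_cleavage(TAG_WITH_SITE)

# ASSUMPTION: the downstream fragment is the part attached to the cDNA,
# so a primer annealing to it is its reverse complement.
primer = reverse_complement(downstream)

print("5' of cleavage point:", upstream)
print("3' of cleavage point:", downstream)
print("complementary primer:", primer)
```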

PCR can be performed using the cDNA Cycle Kit according to the
manufacturer's instructions and primers corresponding to the 3' end of the tag
sequence and oligo-dT. The reaction cycles can be as follows: 2 minutes at 94°C,
then 25 - 35 cycles (10 sec/cycle) at 94°C, 55°C and 72°C, followed by 5 minutes at
72°C. The resulting amplified cDNA can be inserted into a plasmid vector such as
pCR®2 or pCR®2-Topo™ (Invitrogen, Carlsbad, CA) according to the
manufacturer's instructions.
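
The cycling conditions above can be written down as a small data structure to estimate the nominal run time. This is a rough sketch under stated assumptions: "10 sec/cycle" is read literally as the total hold time per cycle, ramping time between temperatures is ignored, and the Step class and program_seconds function are hypothetical names introduced only for the example.

```python
from dataclasses import dataclass


@dataclass
class Step:
    label: str
    temp_c: int
    seconds: int


# Fixed steps taken from the text.
INITIAL = Step("initial denaturation", 94, 120)   # 2 minutes at 94 C
FINAL = Step("final extension", 72, 300)          # 5 minutes at 72 C

# Temperatures visited within each cycle, as listed in the text.
CYCLE_TEMPS_C = (94, 55, 72)


def program_seconds(n_cycles, cycle_seconds=10):
    """Nominal hold time of the whole program, ignoring ramp times.

    ASSUMPTION: cycle_seconds is the total time per cycle (10 s).
    """
    return INITIAL.seconds + n_cycles * cycle_seconds + FINAL.seconds


for n in (25, 30, 35):
    print(f"{n} cycles: {program_seconds(n)} s nominal")
```

Under these assumptions the nominal hold time ranges from 670 seconds for 25 cycles to 770 seconds for 35 cycles; real instruments will take longer because of heating and cooling between steps.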

While the foregoing has been presented with reference to particular 
embodiments of the invention, it will be appreciated by those skilled in the art that
changes in these embodiments may be made without departing from the principles 
and spirit of the invention, the scope of which is defined by the appended claims.

That which is claimed is:

1. A method of producing full-length coding sequences, said method
comprising:

(a) synthesizing first strand cDNA using isolated full-length mRNA
from a population of cells as a template, thereby forming first strand
cDNA/mRNA hybrid(s);

(b) denaturing the first strand cDNA/mRNA hybrid(s);

(c) attaching a non-native tag sequence to the 3' end of the first strand
cDNA; and

(d) producing a full-length double-stranded cDNA by synthesizing 
second strand cDNA using the tagged first strand cDNA produced in step (c). 

2. A method according to claim 1 wherein the mRNA is isolated
employing an affinity purification material. 

3. A method according to claim 2 wherein the affinity purification 
material comprises one or more cap-binding proteins bound to a solid surface. 

4. A method according to claim 3 wherein the cap-binding protein(s) are 
selected from the group consisting of eIF4E, eIF4G, eIF4F, eIF4G, nCBP, and
eIF4E:eIF4G fusion protein.

5. A method according to claim 2 wherein the mRNA to be isolated
comprises a biotinylated cap structure.

6. A method according to claim 5 wherein the affinity purification
material is a streptavidin or avidin-complexed solid support.

7. A method according to claim 1 wherein the mRNA is de-capped and 
de-phosphorylated after isolation.

8. A method according to claim 1 wherein the tag sequence comprises a
recognition site for a site-specific recombinase. 

9. A method according to claim 8 wherein the tag sequence further 
comprises a recognition site for a site-specific restriction endonuclease. 

10. A method according to claim 1 wherein the tag sequence is attached by 
a site-specific recombinase capable of recognizing and acting on single stranded 
DNA. 

11. A method according to claim 10 wherein the site-specific recombinase
is E. coli topoisomerase III.

12. A method according to claim 1 further comprising amplifying the 
cDNA during or after step (d). 

13. A method according to claim 12 further comprising inserting the 
amplified cDNA into an expression vector. 

14. A method according to claim 1 further comprising treating the first 
strand cDNA/mRNA hybrid(s) formed in step (a) with a substance that degrades
single stranded RNA; and isolating the undegraded hybrid(s) with an affinity
purification material having affinity for capped mRNA prior to performing step (b).

15. A method according to claim 14 wherein the substance is RNAse I.

16. A method according to claim 14 wherein the affinity purification 
material comprises one or more cap-binding proteins bound to a solid support.

17. A method according to claim 14 wherein the mRNA component of the 
cDNA/mRNA hybrid comprises a biotinylated cap structure.

18. A method according to claim 17 wherein the affinity purification
material is a streptavidin or avidin-complexed solid support.

19. A method according to claim 14 further comprising inserting the 
double stranded cDNA resulting from step (d) into an expression vector.

20. An isolated full-length coding sequence prepared according to the 
method of claim 1.

21. An expression vector comprising an isolated full-length coding 
sequence prepared according to the method of claim 1.

22. An expression vector according to claim 21 comprising one or more 
elements selected from: a promoter-enhancer, a selection marker encoding sequence, 
an origin of replication, an epitope-tag encoding sequence or an affinity purification- 
tag encoding sequence. 

23. An expression vector according to claim 22 wherein the promoter- 
enhancer is the T7 promoter, gal1 promoter, metallothionein promoter, AraC
promoter, or CMV promoter-enhancer. 

24. An expression vector according to claim 22 wherein the selection 
marker encoding sequence encodes a protein which imparts antibiotic resistance to 
cells. 

25. An expression vector according to claim 22 wherein the epitope-tag 
sequence encodes V5, the peptide Phe-His-His-Thr-Thr (SEQ ID NO:2),
hemagglutinin, or glutathione-S-transferase.

26. An expression vector according to claim 22 wherein the affinity 
purification-tag sequence encodes a polyamino acid tag or a polypeptide.

27. An expression vector according to claim 26 wherein said polyamino
acid tag is polyhistidine. 

28. An expression vector according to claim 26 wherein said polypeptide 
is a chitin binding domain or glutathione-S-transferase. 

29. An expression vector according to claim 26 wherein said polypeptide 
encoding sequence includes an intein encoding sequence. 

30. An expression vector according to claim 21 wherein the expression 
vector is a eukaryotic expression vector or a prokaryotic expression vector. 

31. An expression vector according to claim 30 wherein the eukaryotic 
expression vector is pYES2, pMT, pIND, or pcDNA3.1.

32. A method of obtaining full-length coding sequences comprising: 

(a) contacting full-length mRNA, isolated from a population of cells 
by employing an affinity purification material, with reverse transcriptase,
thereby synthesizing first strand cDNA and forming first strand cDNA/mRNA 
hybrids; 

(b) treating the first strand cDNA/mRNA hybrids with a substance 
that degrades single stranded RNA; 

(c) isolating undegraded hybrid(s) from degraded hybrids employing
an affinity purification material having affinity for capped mRNA; 

(d) denaturing the isolated cDNA/mRNA hybrids obtained from step 
(c) thereby producing single stranded cDNA and single stranded mRNA; 

(e) attaching a non-native tag sequence to the single-stranded cDNA, 
wherein the tag sequence comprises a site-specific recombination sequence 
and is attached by E. coli topoisomerase III; and

(f) synthesizing second strand cDNA using the tagged cDNA as a 
template and/or amplifying the cDNA, wherein the amplification primers
comprise an anti-coding sequence of the tag sequence (5') and oligo-dT (3').

33. A method according to claim 32 further comprising inserting the
cDNA obtained in step (f) into an expression vector.

34. A fusion protein comprising eIF4E and eIF4G.